6 research outputs found

LCNN: Lookup-based Convolutional Neural Network

Author: Bagherinezhad Hessam
Farhadi Ali
Rastegari Mohammad
Publication date: 12/06/2017

Porting state-of-the-art deep learning algorithms to resource-constrained compute platforms (e.g. VR, AR, wearables) is extremely challenging. We propose a fast, compact, and accurate model for convolutional neural networks that enables efficient learning and inference. We introduce LCNN, a lookup-based convolutional neural network that encodes convolutions by a few lookups to a dictionary that is trained to cover the space of weights in CNNs. Training LCNN involves jointly learning a dictionary and a small set of linear combinations. The size of the dictionary naturally traces a spectrum of trade-offs between efficiency and accuracy. Our experimental results on the ImageNet challenge show that LCNN can offer a 3.2x speedup while achieving 55.1% top-1 accuracy using the AlexNet architecture. Our fastest LCNN offers a 37.6x speedup over AlexNet while maintaining 44.3% top-1 accuracy. LCNN not only offers dramatic speedups at inference, but it also enables efficient training. In this paper, we show the benefits of LCNN in few-shot learning and few-iteration learning, two crucial aspects of on-device training of deep learning models.

Comment: CVPR 1

arXiv.org e-Print Archive

Crossref

Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images

Author: Bagherinezhad Hessam
Farhadi Ali
Mottaghi Roozbeh
Rastegari Mohammad
Publication date: 12/11/2015

In this paper, we study the challenging problem of predicting the dynamics of objects in static images. Given a query object in an image, our goal is to provide a physical understanding of the object in terms of the forces acting upon it and its long-term motion in response to those forces. Direct and explicit estimation of the forces and the motion of objects from a single image is extremely challenging. We define intermediate physical abstractions called Newtonian scenarios and introduce the Newtonian Neural Network (N^3), which learns to map a single image to a state in a Newtonian scenario. Our experimental evaluations show that our method can reliably predict the dynamics of a query object from a single image. In addition, our approach can provide physical reasoning that supports the predicted dynamics in terms of velocity and force vectors. To spur research in this direction, we compiled the Visual Newtonian Dynamics (VIND) dataset, which includes 6806 videos aligned with Newtonian scenarios represented using game engines, and 4516 still images with their ground truth dynamics.

arXiv.org e-Print Archive

Crossref

Are Elephants Bigger than Butterflies? Reasoning about Sizes of Objects

Author: Bagherinezhad Hessam
Choi Yejin
Farhadi Ali
Hajishirzi Hannaneh
Publication date: 01/02/2016

Human vision greatly benefits from information about the sizes of objects. The role of size in several visual reasoning tasks has been thoroughly explored in human perception and cognition. However, the impact of information about object sizes is yet to be determined in AI. We postulate that this is mainly attributed to the lack of a comprehensive repository of size information. In this paper, we introduce a method to automatically infer object sizes, leveraging visual and textual information from the web. By maximizing the joint likelihood of textual and visual observations, our method learns reliable relative size estimates with no explicit human supervision. We introduce the relative size dataset and show that our method outperforms competitive textual and visual baselines in reasoning about size comparisons.

Comment: To appear in AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Towards Better Generalization: Model, Data, and Explicit Knowledge

Author: Bagherinezhad Hessam
Publication date: 01/01/2020

Thesis (Ph.D.)--University of Washington, 2020

In this dissertation, I explore three ways to make models more generalizable. 1) Through explicit knowledge extraction. Explicit knowledge enables models to correct their predictions, and in some cases to break a complex task into smaller pieces, each of which can be trained with less data. 2) Through reducing model complexity. It is known that over-parameterized, complex Convolutional Neural Networks (CNNs) often overfit to the given training set and are therefore less generalizable. In this dissertation, I explore redesigned convolutional layers that outperform standard CNNs in few-shot training scenarios. 3) Through making labels more informative. I study the current data labeling paradigm and show that the labels for a simple image classification task are noisy. Noisy labels reduce generalizability because over-parameterized models overfit to the noisy signal that is specific to the training set and therefore perform poorly on an unseen test set. For explicit knowledge extraction, I first explore estimating and modeling the Newtonian physics of a scene, and then explore extracting information about the sizes of objects without any supervision. For reducing model complexity, I explore redesigning convolutional layers to reduce their complexity by sharing a dictionary of vectors among different convolutions. For label noise reduction, I explore making training more accurate by refining the labels of a dataset with a dynamic label generator called Label Refinery.

DSpace at The University of Washington